{
 "cells": [
  {
   "cell_type": "raw",
   "id": "27598444",
   "metadata": {},
   "source": [
    "---\n",
    "sidebar_position: 3\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6e3f0f72",
   "metadata": {},
   "source": [
    "# How to return structured data from a model\n",
    "\n",
    "It is often useful to have a model return output that matches some specific schema. One common use-case is extracting data from arbitrary text to insert into a traditional database or use with some other downstrem system. This guide will show you a few different strategies you can use to do this.\n",
    "\n",
    ":::info Prerequisites\n",
    "\n",
    "This guide assumes familiarity with the following concepts:\n",
    "\n",
    "- [Chat models](/docs/concepts/#chat-models)\n",
    "\n",
    ":::\n",
    "\n",
    "## The `.withStructuredOutput()` method\n",
    "\n",
    "There are several strategies that models can use under the hood. For some of the most popular model providers, including [Anthropic](/docs/integrations/platforms/anthropic/), [Google VertexAI](/docs/integrations/platforms/google/), [Mistral](/docs/integrations/chat/mistral/), and [OpenAI](/docs/integrations/platforms/openai/) LangChain implements a common interface that abstracts away these strategies called `.withStructuredOutput`.\n",
    "\n",
    "By invoking this method (and passing in [JSON schema](https://json-schema.org/) or a [Zod schema](https://zod.dev/)) the model will add whatever model parameters + output parsers are necessary to get back structured output matching the requested schema. If the model supports more than one way to do this (e.g., function calling vs JSON mode) - you can configure which method to use by passing into that method.\n",
    "\n",
    "Let's look at some examples of this in action! We'll use Zod to create a simple response schema.\n",
    "\n",
    "```{=mdx}\n",
    "import ChatModelTabs from \"@theme/ChatModelTabs\";\n",
    "\n",
    "<ChatModelTabs onlyWso={true} />\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "070bf702",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{\n",
       "  setup: \u001b[32m\"Why don't cats play poker in the wild?\"\u001b[39m,\n",
       "  punchline: \u001b[32m\"Too many cheetahs.\"\u001b[39m,\n",
       "  rating: \u001b[33m7\u001b[39m\n",
       "}"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import { z } from \"zod\";\n",
    "\n",
    "const joke = z.object({\n",
    "    setup: z.string().describe(\"The setup of the joke\"),\n",
    "    punchline: z.string().describe(\"The punchline to the joke\"),\n",
    "    rating: z.number().optional().describe(\"How funny the joke is, from 1 to 10\"),\n",
    "});\n",
    "\n",
    "const structuredLlm = model.withStructuredOutput(joke);\n",
    "\n",
    "await structuredLlm.invoke(\"Tell me a joke about cats\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fe6efeab",
   "metadata": {},
   "source": [
    "One key point is that though we set our Zod schema as a variable named `joke`, Zod is not able to access that variable name, and therefore cannot pass it to the model. Though it is not required, we can pass a name for our schema in order to give the model additional context as to what our schema represents, improving performance:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "f3d01a1d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{\n",
       "  setup: \u001b[32m\"Why don't cats play poker in the wild?\"\u001b[39m,\n",
       "  punchline: \u001b[32m\"Too many cheetahs!\"\u001b[39m,\n",
       "  rating: \u001b[33m7\u001b[39m\n",
       "}"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "const structuredLlm = model.withStructuredOutput(joke, { name: \"joke\" });\n",
    "\n",
    "await structuredLlm.invoke(\"Tell me a joke about cats\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "deddb6d3",
   "metadata": {},
   "source": [
    "The result is a JSON object.\n",
    "\n",
    "We can also pass in an OpenAI-style JSON schema dict if you prefer not to use Zod. This object should contain three properties:\n",
    "\n",
    "- `name`: The name of the schema to output.\n",
    "- `description`: A high level description of the schema to output.\n",
    "- `parameters`: The nested details of the schema you want to extract, formatted as a [JSON schema](https://json-schema.org/) dict.\n",
    "\n",
    "In this case, the response is also a dict:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "6700994a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{\n",
       "  setup: \u001b[32m\"Why was the cat sitting on the computer?\"\u001b[39m,\n",
       "  punchline: \u001b[32m\"Because it wanted to keep an eye on the mouse!\"\u001b[39m\n",
       "}"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "const structuredLlm = model.withStructuredOutput(\n",
    "    {\n",
    "        \"name\": \"joke\",\n",
    "        \"description\": \"Joke to tell user.\",\n",
    "        \"parameters\": {\n",
    "            \"title\": \"Joke\",\n",
    "            \"type\": \"object\",\n",
    "            \"properties\": {\n",
    "                \"setup\": {\"type\": \"string\", \"description\": \"The setup for the joke\"},\n",
    "                \"punchline\": {\"type\": \"string\", \"description\": \"The joke's punchline\"},\n",
    "            },\n",
    "            \"required\": [\"setup\", \"punchline\"],\n",
    "        },\n",
    "    }\n",
    ")\n",
    "\n",
    "await structuredLlm.invoke(\"Tell me a joke about cats\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e28c14d3",
   "metadata": {},
   "source": [
    "If you are using JSON Schema, you can take advantage of other more complex schema descriptions to create a similar effect.\n",
    "\n",
    "You can also use tool calling directly to allow the model to choose between options, if your chosen model supports it. This involves a bit more parsing and setup. See [this how-to guide](/docs/how_to/tool_calling/) for more details."
   ]
  },
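   {
    "cell_type": "markdown",
    "id": "a2f23f07",
    "metadata": {},
    "source": [
     "Here is a minimal sketch of that approach, assuming a model whose `.bindTools()` method is implemented (such as `ChatOpenAI`). The tool names, descriptions, and schemas are illustrative only:\n",
     "\n",
     "```typescript\n",
     "// Sketch only: lets the model choose between two hypothetical \"tools\",\n",
     "// then reads the structured arguments off of the tool calls it makes.\n",
     "const modelWithTools = model.bindTools([\n",
     "    {\n",
     "        type: \"function\",\n",
     "        function: {\n",
     "            name: \"joke\",\n",
     "            description: \"A joke to tell the user.\",\n",
     "            parameters: {\n",
     "                type: \"object\",\n",
     "                properties: {\n",
     "                    setup: { type: \"string\" },\n",
     "                    punchline: { type: \"string\" },\n",
     "                },\n",
     "                required: [\"setup\", \"punchline\"],\n",
     "            },\n",
     "        },\n",
     "    },\n",
     "    {\n",
     "        type: \"function\",\n",
     "        function: {\n",
     "            name: \"conversational_response\",\n",
     "            description: \"A plain conversational reply when no joke is appropriate.\",\n",
     "            parameters: {\n",
     "                type: \"object\",\n",
     "                properties: {\n",
     "                    response: { type: \"string\" },\n",
     "                },\n",
     "                required: [\"response\"],\n",
     "            },\n",
     "        },\n",
     "    },\n",
     "]);\n",
     "\n",
     "const aiMessage = await modelWithTools.invoke(\"Tell me a joke about cats\");\n",
     "\n",
     "// Each tool call contains the name the model chose and its structured arguments.\n",
     "console.log(aiMessage.tool_calls);\n",
     "```"
    ]
   },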
  {
   "cell_type": "markdown",
   "id": "39d7a555",
   "metadata": {},
   "source": [
    "### Specifying the output method (Advanced)\n",
    "\n",
    "For models that support more than one means of outputting data, you can specify the preferred one like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "df0370e3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{\n",
       "  setup: \u001b[32m\"Why don't cats play poker in the jungle?\"\u001b[39m,\n",
       "  punchline: \u001b[32m\"Too many cheetahs!\"\u001b[39m\n",
       "}"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "const structuredLlm = model.withStructuredOutput(joke, {\n",
    "    method: \"json_mode\",\n",
    "    name: \"joke\",\n",
    "})\n",
    "\n",
    "await structuredLlm.invoke(\n",
    "    \"Tell me a joke about cats, respond in JSON with `setup` and `punchline` keys\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5e92a98a",
   "metadata": {},
   "source": [
    "In the above example, we use OpenAI's alternate JSON mode capability along with a more specific prompt.\n",
    "\n",
    "For specifics about the model you choose, peruse its entry in the [API reference pages](https://v02.api.js.langchain.com/).\n",
    "\n",
    "## Prompting techniques\n",
    "\n",
    "You can also prompt models to outputting information in a given format. This approach relies on designing good prompts and then parsing the output of the models. This is the only option for models that don't support `.with_structured_output()` or other built-in approaches.\n",
    "\n",
    "### Using `JsonOutputParser`\n",
    "\n",
    "The following example uses the built-in [`JsonOutputParser`](https://v02.api.js.langchain.com/classes/langchain_core_output_parsers.JsonOutputParser.html) to parse the output of a chat model prompted to match a the given JSON schema. Note that we are adding `format_instructions` directly to the prompt from a method on the parser:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "6e514455",
   "metadata": {},
   "outputs": [],
   "source": [
    "import { JsonOutputParser } from \"@langchain/core/output_parsers\";\n",
    "import { ChatPromptTemplate } from \"@langchain/core/prompts\";\n",
    "\n",
    "type Person = {\n",
    "    name: string;\n",
    "    height_in_meters: number;\n",
    "};\n",
    "\n",
    "type People = {\n",
    "    people: Person[];\n",
    "};\n",
    "\n",
    "const formatInstructions = `Respond only in valid JSON. The JSON object you return should match the following schema:\n",
    "{{ people: [{{ name: \"string\", height_in_meters: \"number\" }}] }}\n",
    "\n",
    "Where people is an array of objects, each with a name and height_in_meters field.\n",
    "`\n",
    "\n",
    "// Set up a parser\n",
    "const parser = new JsonOutputParser<People>();\n",
    "\n",
    "// Prompt\n",
    "const prompt = await ChatPromptTemplate.fromMessages(\n",
    "    [\n",
    "        [\n",
    "            \"system\",\n",
    "            \"Answer the user query. Wrap the output in `json` tags\\n{format_instructions}\",\n",
    "        ],\n",
    "        [\n",
    "            \"human\",\n",
    "            \"{query}\",\n",
    "        ]\n",
    "    ]\n",
    ").partial({\n",
    "    format_instructions: formatInstructions,\n",
    "})"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "082fa166",
   "metadata": {},
   "source": [
    "Let’s take a look at what information is sent to the model:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "3d73d33d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "System: Answer the user query. Wrap the output in `json` tags\n",
      "Respond only in valid JSON. The JSON object you return should match the following schema:\n",
      "{{ people: [{{ name: \"string\", height_in_meters: \"number\" }}] }}\n",
      "\n",
      "Where people is an array of objects, each with a name and height_in_meters field.\n",
      "\n",
      "Human: Anna is 23 years old and she is 6 feet tall\n"
     ]
    }
   ],
   "source": [
    "const query = \"Anna is 23 years old and she is 6 feet tall\"\n",
    "\n",
    "console.log((await prompt.format({ query })).toString())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "081956b9",
   "metadata": {},
   "source": [
    "And now let's invoke it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "8d6b3d17",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{ people: [ { name: \u001b[32m\"Anna\"\u001b[39m, height_in_meters: \u001b[33m1.83\u001b[39m } ] }"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "const chain = prompt.pipe(model).pipe(parser);\n",
    "\n",
    "await chain.invoke({ query })"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6732dd87",
   "metadata": {},
   "source": [
    "For a deeper dive into using output parsers with prompting techniques for structured output, see [this guide](/docs/how_to/output_parser_structured).\n",
    "\n",
    "### Custom Parsing\n",
    "\n",
    "You can also create a custom prompt and parser with [LangChain Expression Language (LCEL)](/docs/concepts/#langchain-expression-language), using a plain function to parse the output from the model:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "525721b3",
   "metadata": {},
   "outputs": [],
   "source": [
    "import { AIMessage } from \"@langchain/core/messages\";\n",
    "import { ChatPromptTemplate } from \"@langchain/core/prompts\";\n",
    "\n",
    "type Person = {\n",
    "    name: string;\n",
    "    height_in_meters: number;\n",
    "};\n",
    "\n",
    "type People = {\n",
    "    people: Person[];\n",
    "};\n",
    "\n",
    "const schema = `{{ people: [{{ name: \"string\", height_in_meters: \"number\" }}] }}`\n",
    "\n",
    "// Prompt\n",
    "const prompt = await ChatPromptTemplate.fromMessages(\n",
    "    [\n",
    "        [\n",
    "            \"system\",\n",
    "            `Answer the user query. Output your answer as JSON that\n",
    "matches the given schema: \\`\\`\\`json\\n{schema}\\n\\`\\`\\`.\n",
    "Make sure to wrap the answer in \\`\\`\\`json and \\`\\`\\` tags`\n",
    "        ],\n",
    "        [\n",
    "            \"human\",\n",
    "            \"{query}\",\n",
    "        ]\n",
    "    ]\n",
    ").partial({\n",
    "    schema\n",
    "});\n",
    "\n",
    "/**\n",
    " * Custom extractor\n",
    " * \n",
    " * Extracts JSON content from a string where\n",
    " * JSON is embedded between ```json and ``` tags.\n",
    " */\n",
    "const extractJson = (output: AIMessage): Array<People> => {\n",
    "    const text = output.content as string;\n",
    "    // Define the regular expression pattern to match JSON blocks\n",
    "    const pattern = /```json(.*?)```/gs;\n",
    "\n",
    "    // Find all non-overlapping matches of the pattern in the string\n",
    "    const matches = text.match(pattern);\n",
    "\n",
    "    // Process each match, attempting to parse it as JSON\n",
    "    try {\n",
    "        return matches?.map(match => {\n",
    "            // Remove the markdown code block syntax to isolate the JSON string\n",
    "            const jsonStr = match.replace(/```json|```/g, '').trim();\n",
    "            return JSON.parse(jsonStr);\n",
    "        }) ?? [];\n",
    "    } catch (error) {\n",
    "        throw new Error(`Failed to parse: ${output}`);\n",
    "    }\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9f1bc8f7",
   "metadata": {},
   "source": [
    "Here is the prompt sent to the model:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "c8a30d0e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "System: Answer the user query. Output your answer as JSON that\n",
      "matches the given schema: ```json\n",
      "{{ people: [{{ name: \"string\", height_in_meters: \"number\" }}] }}\n",
      "```.\n",
      "Make sure to wrap the answer in ```json and ``` tags\n",
      "Human: Anna is 23 years old and she is 6 feet tall\n"
     ]
    }
   ],
   "source": [
    "const query = \"Anna is 23 years old and she is 6 feet tall\"\n",
    "\n",
    "console.log((await prompt.format({ query })).toString())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ec018893",
   "metadata": {},
   "source": [
    "And here's what it looks like when we invoke it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "e1e7baf6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[\n",
       "  { people: [ { name: \u001b[32m\"Anna\"\u001b[39m, height_in_meters: \u001b[33m1.83\u001b[39m } ] }\n",
       "]"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import { RunnableLambda } from \"@langchain/core/runnables\";\n",
    "\n",
    "const chain = prompt.pipe(model).pipe(new RunnableLambda({ func: extractJson }));\n",
    "\n",
    "await chain.invoke({ query })"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7a39221a",
   "metadata": {},
   "source": [
    "## Next steps\n",
    "\n",
    "Now you've learned a few methods to make a model output structured data.\n",
    "\n",
    "To learn more, check out the other how-to guides in this section, or the conceptual guide on tool calling."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Deno",
   "language": "typescript",
   "name": "deno"
  },
  "language_info": {
   "file_extension": ".ts",
   "mimetype": "text/x.typescript",
   "name": "typescript",
   "nb_converter": "script",
   "pygments_lexer": "typescript",
   "version": "5.3.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
